Accelerating Spherical k-Means

نویسندگان

چکیده

Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors. While several improvements accelerations have been introduced the original algorithm, not all easily translate to spherical variant: Many acceleration techniques, algorithms of Elkan Hamerly, rely on triangle inequality Euclidean distances. However, uses Cosine similarities instead distances computational efficiency. In this paper, we incorporate Hamerly working directly with Cosines obtain substantial speedup evaluate these real data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Title Spherical K-means Clustering

October 21, 2009 Type Package Title Spherical k-Means Clustering Version 0.1-2 Author Kurt Hornik, Ingo Feinerer, Martin Kober Maintainer Kurt Hornik Description Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a simple fixed-point algorithm and an interface to the CLUTO vcluster program. License GPL-2 Imports slam...

متن کامل

Accelerating Lloyd’s Algorithm for k-Means Clustering

The k-means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd’s algorithm is the standard batch, hill-climbing approach for minimizing the k-means optimization criterion. It spends a vast majority of its time computing distances between each of the k cluster center...

متن کامل

Package 'skmeans' Title Spherical K-means Clustering

Description Algorithms to compute spherical k-means partitions. Features several methods, including a genetic and a fixed-point algorithm and an interface to the CLUTO vcluster program.

متن کامل

K-Means for Spherical Clusters with Large Variance in Sizes

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in ...

متن کامل

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way those objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis This paper proposed an improved version of K-Means algorithm, namely Persistent K...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Lecture Notes in Computer Science

سال: 2021

ISSN: ['1611-3349', '0302-9743']

DOI: https://doi.org/10.1007/978-3-030-89657-7_17